Abstract:- The growth of the internet has been accompanied by an increase in suspicious end-user activity.
As more users connect to networked environments, unauthorized activity in these systems grows. To protect
data from unauthorized activity and to detect intrusions, a security mechanism is needed that identifies
probable signs of unauthorized events. An Intrusion Detection System (IDS) is used to find such activities.
Intrusion detection is the process of intelligently monitoring system activities to identify possible signs
of attack. The primary aim of an IDS is therefore to protect the availability, confidentiality and integrity
of networked information systems. In this research, classifiers based on Naive Bayes, Bagging, Boosting,
Stacking, and J48 are applied to the five categories found in the NSL-KDD intrusion detection dataset, for
novel attacks as well as for the original and preprocessed datasets. The performance of the classification
algorithms is compared across the five broad categories, namely Normal, Probe, DoS, U2R, and R2L.

Keywords:- Intrusion Detection System (IDS), Classification Techniques, Information Gain (IG), Network 
Attack, Evaluation Metrics.



I. INTRODUCTION 

Today the internet and computer systems have become a part of daily life and an essential tool. They serve
people in many areas, such as business, education, medicine and entertainment. The openness and scalability of
the internet have made it a flexible platform for a new generation of on-line services, such as e-commerce,
military applications, social networks, public web services, stock prices, online shopping and online
reservation. The popularity of these services has resulted in a huge volume of financial transactions and
other types of sensitive information being accessed via the internet. The explosive use of networks, together
with the importance and value of this information and the related on-line services, has raised numerous
security issues and has made the internet a target for a wide variety of attacks that threaten its security [5, 7].

Figure 1 depicts the organization of an IDS, where solid lines indicate data/control flow while dashed lines
indicate responses to intrusive activities [42].



1.1 Network Security 

Network security consists of the provisions and policies adopted by a network administrator to prevent and
monitor unauthorized access, misuse, modification or denial of a computer network and network-accessible
resources [43].

1.2 Intrusion and Intruder 

Intrusion means breaking into a computer system or network and then misusing it to perform malicious
activities. An intrusion occurs when a user of an information system takes an action that the user is not
legally permitted to perform [43].

An intruder is a person who breaks into a computer system or network and misuses it. There are two types of
intruders, namely external intruders and internal intruders [43].

External intrusions come from outside and cause damage to a computer system or network. External
intruders do not have any authorized access to the system they attack. Hackers are an example of external
intruders [3]. An internal intruder is an insider who exceeds his limited authority to take action. His action
may or may not be harmful to the health of the system or the services it provides, but it seeks to gain added
ability to act without proper authorization. Internal intruders may act within or outside their limits of
authorization [8, 19].

1.3 Intrusion Detection System (IDS) 

Intrusion detection is the process of monitoring and analysing the events occurring in a computer system
in order to detect signs of security problems. It is a security measure that helps to identify a set of
malicious actions that compromise the integrity, confidentiality and availability of information resources.
Intrusion detection is a difficult problem because of the trade-offs among detection accuracy, detection speed,
the dynamic environment of the networks and the processing power available for handling high volumes of
data from distributed networked systems [9].

1.4 Detection Methodologies 

Intrusion detection methodologies are classified into the following two major categories:

(i) Signature-based Detection (SD) 

(ii) Anomaly-based Detection (AD) 

1.4.1 Signature-based Detection (Knowledge-based) 

A signature is a pattern or string that corresponds to a known attack or threat. Signature-based detection
(SD) is the process of comparing such patterns against captured events in order to recognize likely intrusions.
Because it uses the knowledge accumulated from specific attacks and system vulnerabilities, SD is also known
as Knowledge-based Detection or Misuse Detection [13, 22].
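
As a rough illustration of the comparison step described above, the following Python sketch matches a small set of signatures against captured events. The signature names, byte patterns and sample events are hypothetical and chosen only for illustration; they are not taken from any real rule set.

```python
from typing import List

# Hypothetical signatures: name -> byte pattern associated with a known attack.
SIGNATURES = {
    "phf_cgi_probe": b"/cgi-bin/phf?",
    "passwd_file_read": b"/etc/passwd",
}

def match_signatures(event: bytes) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the event."""
    return [name for name, pattern in SIGNATURES.items() if pattern in event]

# Hypothetical captured events (e.g. HTTP request lines).
events = [
    b"GET /index.html HTTP/1.0",
    b"GET /cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd HTTP/1.0",
]
for event in events:
    hits = match_signatures(event)
    print("ALERT" if hits else "ok", hits)
```

A detector of this kind can only recognize attacks for which a signature already exists, which is why it is also called knowledge-based.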

1.4.2 Anomaly-based Detection (Behavior-based) 

An anomaly is a deviation from known behavior, and profiles represent the normal or expected behaviors
derived from monitoring regular activities, network connections, hosts or users over a period of time. A
profile may be either static or dynamic and may cover attributes such as usage or the count of e-mails sent.
Anomaly-based detection (AD) compares normal profiles with observed events to identify significant attacks.
Examples of what AD can flag are attempted break-ins, masquerading, penetration by a legitimate user,
Denial-of-Service (DoS), Trojan horses, etc. [6, 19].
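
The comparison of a normal profile with an observed event can be sketched as follows. This is a minimal example that assumes a static profile consisting of the mean and standard deviation of one attribute (e-mails sent per hour); the history values and the threshold of three standard deviations are assumptions made for illustration.

```python
from statistics import mean, stdev

def build_profile(history):
    """Summarise normal behaviour as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

emails_per_hour = [4, 6, 5, 7, 5, 6, 4, 5]  # hypothetical normal activity
profile = build_profile(emails_per_hour)
print(is_anomalous(6, profile))   # within the normal range
print(is_anomalous(60, profile))  # far outside the profile
```

A dynamic profile would simply be rebuilt periodically from a sliding window of recent observations.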

1.5 Type of Intrusion Detection System 

There are several types of intrusion detection systems, and the choice of which one to use depends on
the overall risks to the organization and the resources available. One classification of IDS is based on
the resource they monitor. According to this classification, IDS are divided into two categories:
Host-based Intrusion Detection Systems (HIDS) and Network-based Intrusion Detection Systems (NIDS).
A HIDS resides on a particular host and looks for indications of attacks on that host. A NIDS resides on a
separate system that watches network traffic, looking for indications of attacks that traverse the
monitored part of the network [14, 20].

1.5.1 Host Based Intrusion Detection System 

A Host Based Intrusion Detection System (HIDS) monitors characteristics such as network traffic,
system logs, running processes, application activity, file access and modification, and changes to system
and application configuration. It is most commonly deployed on critical hosts such as publicly accessible
servers and servers containing sensitive information [14, 21]. An HIDS exists as a software process on a
system. Traditionally, HIDS have examined log entries for specific information. More recently, a new form
of HIDS has been created that examines calls to the operating system kernel. This type of HIDS is programmed
with known attack signatures and will raise an alarm if a system call matches any of the signatures. Both
types of HIDS are capable of checking files on the system for modification. This is done by computing a
cryptographic checksum of the file using a hashing function such as MD5. This value is then stored and
compared against periodic checksums of the file. If the checksums do not match, the file has possibly been
altered and the HIDS will report this information [38].
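
The file-integrity check described above can be sketched with Python's standard `hashlib` module. The function names and the temporary demonstration file are assumptions made for this example. MD5 is used because the text names it; since MD5 is known to be weak against collisions, another algorithm such as `sha256` can be passed through the `algorithm` parameter.

```python
import hashlib
import os
import tempfile

def file_checksum(path, algorithm="md5"):
    """Return the hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(baseline):
    """Return the paths whose current checksum differs from the stored baseline."""
    return [path for path, stored in baseline.items()
            if file_checksum(path) != stored]

# Demonstration on a temporary file standing in for a watched system file.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("original contents")
    watched = tmp.name

baseline = {watched: file_checksum(watched)}   # stored reference checksums
unchanged_result = changed_files(baseline)      # nothing modified yet
with open(watched, "a") as handle:
    handle.write(" tampered")
tampered_result = changed_files(baseline)       # the watched file is reported
os.remove(watched)
```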

1.5.2 Network Based Intrusion Detection System 

A Network Based Intrusion Detection System (NIDS) exists as a software process on a dedicated
hardware system. The NIDS software places the network interface card of the system into promiscuous mode,
which means that the card passes all traffic on the network to the NIDS software. The traffic is then
analyzed according to a set of rules and attack signatures to determine whether it is traffic of interest.
If it is, an event is generated. The most common configuration for an NIDS is to use two network interface
cards. One card is used to monitor a network. This card is placed in a 'stealthy' mode so that it does not
have an IP address and therefore does not respond to incoming connections. The stealthy card does not have a
protocol stack bound to it, so it cannot respond to probes such as a ping. The second card is used to
communicate with the IDS management system and to send alarms. This card is attached to an internal network
that is not visible to the network being monitored [42].

1.6 Networking Attacks 

The four main categories of networking attack are the following. Every attack on a network can comfortably
be placed into one of these groupings.

Denial of Service (DoS): A DoS attack is a type of attack in which the hacker makes computing or memory
resources too busy or too full to serve legitimate networking requests, thereby denying users access to a
machine. Apache, smurf, neptune, ping of death, mail bomb and UDP storm are all DoS attacks [12, 44].

Remote to Local/User Attacks (R2L): The attacker does not have an account on the target machine and therefore
tries to gain access. Examples are guess_passwd, ftp_write, multihop, phf, spy, imap, warezclient and
warezmaster [12, 23].

User to Root Attacks (U2R): These attacks are exploits in which the hacker starts off on the system with a
normal user account and attempts to abuse vulnerabilities in the system in order to gain super user
privileges, e.g. perl, xterm etc. [12].

Probing: Probing is an attack in which the hacker scans a machine or a networking device in order to determine
weaknesses or vulnerabilities which may later be exploited to compromise the system. This technique
is commonly used for information gathering, e.g. saint, portsweep, mscan, nmap etc. [12].

1.7 Classification Algorithms 

Classification is the problem of identifying to which of a set of categories (sub-populations) a new
observation belongs, on the basis of a training dataset containing observations (or instances) whose category
membership is known. This technique is used to predict group membership for data instances [25].

1.7.1 Naive Bayes Classification 

Naive Bayes is a simple classification technique that applies a probabilistic model derived from Bayes'
theorem under the assumption that the attributes are independent. It is a supervised learning algorithm
that uses the maximum likelihood method for parameter estimation. It requires a set of training data to
estimate the means and variances of the attributes used for classification.

The Naive Bayesian Classifier, or simple Bayesian classifier, works as follows: 

1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an
n-dimensional attribute vector, $X = (x_1, x_2, \ldots, x_n)$, depicting n measurements made on the tuple from
n attributes, respectively $A_1, A_2, \ldots, A_n$.

2. Suppose that there are m classes, $C_1, C_2, \ldots, C_m$. Given a tuple, X, the classifier will predict that
X belongs to the class having the highest posterior probability, conditioned on X. That is, the naive Bayesian
classifier predicts that tuple X belongs to the class $C_i$ if and only if

$P(C_i \mid X) > P(C_j \mid X)$ for $1 \le j \le m$, $j \ne i$.

Thus we maximize $P(C_i \mid X)$. The class $C_i$ for which $P(C_i \mid X)$ is maximized is called the maximum
posteriori hypothesis.

3. As $P(X)$ is constant for all classes, only $P(X \mid C_i) P(C_i)$ need be maximized. If the class prior
probabilities are not known, then it is commonly assumed that the classes are equally likely, that is,
$P(C_1) = P(C_2) = \cdots = P(C_m)$, and we would therefore maximize $P(X \mid C_i)$. Otherwise, we maximize
$P(X \mid C_i) P(C_i)$. Note that the class prior probabilities may be estimated by
$P(C_i) = |C_{i,D}| / |D|$, where $|C_{i,D}|$ is the number of training tuples of class $C_i$ in D.

4. Given data sets with many attributes, it would be extremely computationally expensive to compute
$P(X \mid C_i)$. In order to reduce computation in evaluating $P(X \mid C_i)$, the naive assumption of class
conditional independence is made. This presumes that the values of the attributes are conditionally independent
of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the
attributes). Thus,

$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$.

The probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), \ldots, P(x_n \mid C_i)$ can be estimated from the training
tuples. Recall that here $x_k$ refers to the value of attribute $A_k$ for tuple X.
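
The four steps above can be sketched in Python for categorical attributes. This is a minimal illustration rather than the implementation used in this study: the attribute values and labels are hypothetical, numeric attributes (which would need the means and variances mentioned earlier) are not handled, and no smoothing is applied, so a value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

def train_naive_bayes(tuples, labels):
    """Estimate P(C_i) and the counts needed for P(x_k | C_i) from the training set D."""
    class_counts = Counter(labels)
    # value_counts[(class, k)][value] = number of tuples of that class with x_k == value
    value_counts = defaultdict(Counter)
    for x, c in zip(tuples, labels):
        for k, value in enumerate(x):
            value_counts[(c, k)][value] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}  # |C_i,D| / |D|
    return priors, value_counts, class_counts

def predict(model, x):
    """Return the class C_i maximising P(C_i) * prod_k P(x_k | C_i)."""
    priors, value_counts, class_counts = model
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, value in enumerate(x):
            score *= value_counts[(c, k)][value] / class_counts[c]
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical training tuples: (protocol, connection flag).
D = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
     ("tcp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
labels = ["normal", "normal", "normal", "dos", "dos", "dos"]
model = train_naive_bayes(D, labels)
print(predict(model, ("tcp", "S0")), predict(model, ("udp", "SF")))
```

A Laplace correction (adding one to every count) is a common remedy for the zero-probability problem noted above.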

1.7.2 Bagging Classifier 

Bagging, which stands for bootstrap aggregation, is one of the simplest but most successful ensemble
methods for improving performance on unstable classification problems. The bagging technique is very useful
for large and high-dimensional data, such as intrusion data sets.
Algorithm: Bagging. The bagging algorithm creates an ensemble of models (classifiers or predictors) for a
learning scheme where each model gives an equally-weighted prediction.
Input:

D, a set of d training tuples; 

k, the number of models in the ensemble; 

a learning scheme (e.g., decision tree algorithm, back propagation, etc.) 

Output: A composite model, M. 

Method: 

(1) for i = 1 to k do // create k models:

(2) create bootstrap sample, Di, by sampling D with replacement; 

(3) use Di to derive a model, Mi; 

(4) end for 

To use the composite model on a tuple, X: 

(1) if classification then 

(2) let each of the k models classify X and return the majority vote; 

(3) if prediction then 

(4) let each of the k models predict a value for X and return the average predicted value. 
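
The bagging procedure can be sketched in Python as follows. The 1-nearest-neighbour base learner, the one-dimensional toy data and the fixed random seed are assumptions made only to keep the example self-contained; any learning scheme that maps a training sample to a callable model could be passed as `learn`.

```python
import random
from collections import Counter

def learn_1nn(sample):
    """Hypothetical base learner: 1-nearest-neighbour over (feature, label) pairs."""
    def model(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return model

def bagging_fit(data, k, learn, rng):
    """Steps (1)-(4): derive k models, each from a bootstrap sample of D."""
    models = []
    for _ in range(k):
        bootstrap = rng.choices(data, k=len(data))  # sampling with replacement
        models.append(learn(bootstrap))
    return models

def bagging_classify(models, x):
    """Let each of the k models classify x and return the majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: small feature values are normal, large ones are attacks.
data = [(v, "normal") for v in range(5)] + [(v, "attack") for v in range(10, 15)]
models = bagging_fit(data, k=11, learn=learn_1nn, rng=random.Random(0))
print(bagging_classify(models, 1), bagging_classify(models, 13))
```

For prediction of a numeric value, the majority vote would be replaced by the average of the k model outputs.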

1.7.3 Boosting Classifier 

Boosting is an ensemble method for combining a set of weak classifiers into a strong classifier. This
technique can be viewed as a model averaging method; it was originally designed for classification, but it
can also be applied to regression. Boosting learns the predictors sequentially.
Algorithm: AdaBoost. A boosting algorithm creates an ensemble of classifiers. Each one gives a weighted vote.
Input: 

D, a set of d class-labeled training tuples; 

k, the number of rounds (one classifier is generated per round); 

a classification learning scheme. 

Output: A composite model. 

Method: 

(1) initialize the weight of each tuple in D to 1/d;

(2) for i= 1 to k do // for each round: 

(3) sample D with replacement according to the tuple weights to obtain Di; 

(4) use training set Di to derive a model, Mi; 

(5) compute error(Mi), the error rate of Mi 

(6) if error(Mi) > 0.5 then

(7) reinitialize the weights to 1/d

(8) go back to step 3 and try again; 

(9) endif 

(10) for each tuple in Di that was correctly classified do 

(11) multiply the weight of the tuple by error(Mi)/(1 - error(Mi)); // update weights

(12) normalize the weight of each tuple; 

(13) endfor 

To use the composite model to classify tuple, X: 

(1) initialize weight of each class to 0; 

(2) for i = 1 to k do // for each classifier: 

(3) wi = log((1 - error(Mi)) / error(Mi)); // weight of the classifier's vote

(4) c = Mi(X); // get class prediction for X from Mi

(5) add wi to weight for class c 

(6) endfor 

(7) return the class with the largest weight. 
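
A compact Python sketch of the AdaBoost listing follows. Several details are assumptions made for illustration: the 1-nearest-neighbour base learner and the toy data are hypothetical, the error rate is computed as the total weight of misclassified tuples over all of D, the weights of correctly classified tuples in D are the ones updated, and a zero error is clamped to a small `eps` so that the logarithm and the weight update stay defined.

```python
import math
import random
from collections import defaultdict

def learn_1nn(sample):
    """Hypothetical base learner: 1-nearest-neighbour over (feature, label) pairs."""
    def model(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return model

def adaboost_fit(data, k, learn, rng, eps=1e-10):
    d = len(data)
    weights = [1.0 / d] * d                       # step (1)
    ensemble = []
    while len(ensemble) < k:                      # step (2)
        sample = rng.choices(data, weights=weights, k=d)   # step (3)
        model = learn(sample)                     # step (4)
        correct = [model(x) == y for x, y in data]
        error = sum(w for w, ok in zip(weights, correct) if not ok)  # step (5)
        if error > 0.5:                           # steps (6)-(9): reset and retry
            weights = [1.0 / d] * d
            continue
        error = max(error, eps)                   # assumption: avoid division by zero
        factor = error / (1.0 - error)
        weights = [w * factor if ok else w for w, ok in zip(weights, correct)]
        total = sum(weights)
        weights = [w / total for w in weights]    # step (12): normalize
        ensemble.append((model, math.log((1.0 - error) / error)))
    return ensemble

def adaboost_classify(ensemble, x):
    """Add each classifier's vote weight to its predicted class; return the heaviest class."""
    class_weight = defaultdict(float)
    for model, vote in ensemble:
        class_weight[model(x)] += vote
    return max(class_weight, key=class_weight.get)

data = [(v, "normal") for v in range(5)] + [(v, "attack") for v in range(10, 15)]
ensemble = adaboost_fit(data, k=5, learn=learn_1nn, rng=random.Random(0))
print(adaboost_classify(ensemble, 1), adaboost_classify(ensemble, 13))
```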

1.7.4 Stacking Classification 

Stacking is short for Stacked Generalization. Unlike bagging and boosting, it uses different learning
algorithms to generate the ensemble of classifiers. The main idea of stacking is to combine classifiers
from different learners, such as decision trees, instance-based learners etc. Since each one uses a different
knowledge representation and different learning biases, the hypothesis space will be explored differently and
different classifiers will be found. Thus, it is expected that they will not be correlated.

When the classifiers have been generated they must be combined. One way to combine their outputs is by
voting, the same mechanism used in bagging. However, (unweighted) voting only makes sense if the learning
schemes perform comparably well. If, for example, two of three classifiers make predictions that are
completely incorrect, the majority vote leads to a bad final classification. The problem with voting is that
it is not clear which classifier to trust. Stacking therefore does not use a voting system; instead it
introduces the concept of a meta learner, which replaces the voting procedure [34]. Stacking tries to learn
which classifiers are the reliable ones, using another learning algorithm, the meta learner, to discover how
best to combine the outputs of the base learners [25].

The input to the meta model, also called the level-1 model, is the predictions of the base models, or
level-0 models. A level-1 instance has as many attributes as there are level-0 learners, and the attribute values 
give the predictions of these learners on the corresponding level-0 instance. When the stacked learner is used for 
classification, an instance is first fed into the level-0 models, and each one guesses a class value. These guesses 
are fed into the level-1 model, which combines them into the final prediction. 
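
The level-0/level-1 arrangement can be sketched as follows. The two base learners, the table-based meta learner and the toy data are hypothetical. Training the level-1 model on a separate holdout set, rather than on the data used to fit the level-0 models, is an assumption of this sketch and is not stated in the description above.

```python
from collections import Counter, defaultdict

def learn_1nn(sample):
    """Level-0 learner: 1-nearest-neighbour over (feature, label) pairs."""
    def model(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return model

def learn_majority(sample):
    """Level-0 learner: always predicts the most frequent training label."""
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label

def learn_meta(level1_data):
    """Level-1 learner: map each tuple of level-0 guesses to the label most often seen with it."""
    table = defaultdict(Counter)
    for guesses, y in level1_data:
        table[guesses][y] += 1
    default = Counter(y for _, y in level1_data).most_common(1)[0][0]
    def meta(guesses):
        if guesses in table:
            return table[guesses].most_common(1)[0][0]
        return default
    return meta

def stacking_fit(train, holdout, base_learners):
    level0 = [learn(train) for learn in base_learners]
    # One level-1 attribute per level-0 learner: its guess for the instance.
    level1_data = [(tuple(m(x) for m in level0), y) for x, y in holdout]
    return level0, learn_meta(level1_data)

def stacking_classify(stacked, x):
    level0, meta = stacked
    return meta(tuple(m(x) for m in level0))

train = [(0, "normal"), (1, "normal"), (2, "normal"), (3, "normal"),
         (10, "attack"), (11, "attack")]
holdout = [(4, "normal"), (12, "attack"), (13, "attack"), (5, "normal")]
stacked = stacking_fit(train, holdout, [learn_1nn, learn_majority])
print(stacking_classify(stacked, 14), stacking_classify(stacked, 1))
```

In this toy run the majority learner always guesses "normal", and the meta learner learns from the holdout set that the nearest-neighbour guess is the one to trust.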
1.7.5 J48 Decision Trees Classification 

A decision tree is a predictive machine-learning model that decides the target value (dependent
variable) of a new sample based on various attribute values of the available data. The internal nodes of a
decision tree represent the different attributes, the branches between the nodes represent the possible values
that these attributes can have in the observed samples, and the terminal nodes give the final value
(classification) of the dependent variable.

The attribute that is to be predicted is known as the dependent variable, since its value depends upon, or
is decided by, the values of all the other attributes. The other attributes, which help in predicting the
value of the dependent variable, are known as the independent variables in the dataset.

The J48 decision tree classifier follows a simple algorithm. In order to classify a new item, it first
needs to create a decision tree based on the attribute values of the available training data. So, whenever
it encounters a set of items (a training set), it identifies the attribute that discriminates the various
instances most clearly. This feature, which tells us the most about the data instances so that we can
classify them best, is said to have the highest information gain [4]. Now, among the possible values of this
feature, if there is any value for which there is no ambiguity, that is, for which the data instances falling
within its category all have the same value for the target variable, then we terminate that branch and assign
to it the target value that we have obtained [25].

For the other cases, we then look for another attribute that gives the highest information gain. We
continue in this manner until we either get a clear decision about what combination of attributes gives a
particular target value, or run out of attributes. In the event that we run out of attributes, or if we
cannot get an unambiguous result from the available information, we assign this branch the target value that
the majority of the items under this branch possess [36].
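
The tree-growing procedure described above can be sketched as a short recursive function. This is a simplified, ID3-style illustration over categorical attributes with hypothetical data; J48 is Weka's implementation of C4.5, which additionally uses the gain ratio, handles numeric attributes and prunes the tree, none of which is shown here.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:        # no ambiguity: terminate the branch
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:                    # ran out of attributes: majority value
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    children = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (best, children, majority)

def classify(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(row[attr], majority)  # unseen value: fall back to majority
    return tree

rows = [{"protocol": "tcp", "flag": "SF"}, {"protocol": "udp", "flag": "SF"},
        {"protocol": "tcp", "flag": "S0"}, {"protocol": "icmp", "flag": "S0"},
        {"protocol": "tcp", "flag": "SF"}]
labels = ["normal", "normal", "dos", "dos", "normal"]
tree = build_tree(rows, labels, ["protocol", "flag"])
print(tree[0], classify(tree, {"protocol": "tcp", "flag": "S0"}))
```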

1.8 Feature Selection Algorithms. 

Attribute selection, also known as feature selection, is the process of selecting a subset of the terms
occurring in the training set and using only this subset as features in text classification. Feature
selection serves two main purposes [35].

1. It makes training and applying a classifier more efficient by decreasing the size of the effective 
vocabulary. 

2. Feature selection often increases classification accuracy by eliminating noise features (A noise feature 
is one that, when added to the document representation, increases the classification error on new data). 

1.8.1 Information Gain Attribute Ranking 

This is one of the simplest (and fastest) attribute ranking methods and is often used in text 
categorization applications where the sheer dimensionality of the data precludes more sophisticated attribute 
selection techniques [36]. 

If A is an attribute and C is the class, the following equations give the entropy of the class before and
after observing the attribute:

$H(C) = -\sum_{c \in C} p(c) \log_2 p(c)$,

$H(C \mid A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_2 p(c \mid a)$.

The amount by which the entropy of the class decreases reflects the additional information about the
class provided by the attribute and is called information gain. Each attribute $A_i$ is assigned a score based
on the information gain between itself and the class:

$IG_i = H(C) - H(C \mid A_i)$

$= H(A_i) - H(A_i \mid C)$

$= H(A_i) + H(C) - H(A_i, C)$.

Data sets with numeric attributes are first discretized using the method of Fayyad and Irani.
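
The ranking can be sketched in Python as follows. The attribute names and values are hypothetical and already discrete, so the Fayyad and Irani discretization step is not shown. The example also checks numerically that the first and the last form of the information gain agree.

```python
import math
from collections import Counter

def entropy(values):
    """H(V) = -sum_v p(v) log2 p(v), estimated from observed frequencies."""
    total = len(values)
    return -sum(n / total * math.log2(n / total) for n in Counter(values).values())

def conditional_entropy(target, given):
    """H(target | given) = sum_g p(g) * H(target | given = g)."""
    total = len(given)
    result = 0.0
    for g, n in Counter(given).items():
        subset = [t for t, gv in zip(target, given) if gv == g]
        result += n / total * entropy(subset)
    return result

def information_gain(attribute, classes):
    """IG = H(C) - H(C | A)."""
    return entropy(classes) - conditional_entropy(classes, attribute)

# Hypothetical discrete attributes and class labels for six connections.
attributes = {
    "flag":     ["SF", "SF", "S0", "S0", "SF", "S0"],
    "protocol": ["tcp", "tcp", "tcp", "tcp", "udp", "udp"],
}
label = ["normal", "normal", "dos", "dos", "normal", "normal"]

ig_flag = information_gain(attributes["flag"], label)
# Symmetric form: H(A) + H(C) - H(A, C), with the joint built from value pairs.
symmetric = (entropy(attributes["flag"]) + entropy(label)
             - entropy(list(zip(attributes["flag"], label))))
ranking = sorted(attributes,
                 key=lambda name: information_gain(attributes[name], label),
                 reverse=True)
print(ranking, round(ig_flag, 3))
```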

II. LITERATURE SURVEY 

This section reviews the current literature and related work on intrusion detection systems, covering
different methods and technologies, through an examination of various research papers, journals and
online resources.

Aruna Jamdagni et al. [1] proposed RePIDS and evaluated it using the DARPA 99 dataset and the Georgia
Institute of Technology attack dataset. Traffic for Web-based applications was used to validate the model.
The F-value criterion was used to evaluate the detection performance of RePIDS. Experimental results show
that RePIDS achieves better performance (high F-values, 0.9958 for the DARPA 99 dataset and 0.976 for the
Georgia Institute of Technology attack dataset, with only a 0.85% false alarm rate) and lower computational
complexity when compared against two state-of-the-art payload-based intrusion detection systems. Additionally,
it has 1.3 times higher throughput in comparison with the real scenario of a medium-sized enterprise network.

Ahmed Patel et al. [2] surveyed the latest Intrusion Detection and Prevention Systems (IDPSs) and alarm
management techniques, providing a comprehensive taxonomy and investigating possible solutions to detect
and prevent intrusions in cloud computing systems. Considering the desired characteristics of IDPSs and cloud
computing systems, a list of germane requirements is identified, and four concepts, namely autonomic
computing self-management, fuzzy theory, risk management and ontology, are leveraged to satisfy these
requirements.

Chung-Ming Ou [8] adapted an agent-based artificial immune system (ABAIS) to intrusion detection
systems (IDS). An Agent-Based IDS (ABIDS) inspired by the danger theory of the human immune system is
proposed. Multiple agents are embedded in ABIDS, where agents coordinate with one another to calculate the
Mature Context Antigen Value (MCAV) and update the activation threshold for security responses. The
intelligence behind ABIDS is based on the danger theory and the functionalities of dendritic cells in human
immune systems: Dendritic Cell agents (DC agents) are emulated for the innate immune subsystem and artificial
T-Cell agents (TC agents) for the adaptive immune subsystem. Antigens are profiles of system calls, while the
corresponding behaviours are regarded as signals. ABIDS is based on the dual detection of DC agents for
signals and TC agents for antigens. ABAIS is an intelligent system with learning and memory capabilities.
According to the MCAVs, the immune response to malicious behaviours is activated by either the computer
host or the Security Operating Centre. Accordingly, computer hosts hit by malicious intrusions can be
effectively detected through input signals and temporary output signals such as PAMP, danger and safe signals.

Chenfeng Vincent Zhou et al. [9] summarized the current research directions in detecting coordinated
attacks using collaborative intrusion detection systems (CIDSs). In particular, they highlight two main
challenges in CIDS research: CIDS architectures and alert correlation algorithms. The paper reviews the
current CIDS approaches in terms of these two challenges and concludes by highlighting opportunities for an
integrated solution to large-scale collaborative intrusion detection.

C. Kolias et al. [10] explored the reasons that led to the application of Swarm Intelligence (SI) in
intrusion detection and presented SI methods that have been used for constructing IDS. A main contribution is
a detailed comparison of several SI-based IDS in terms of efficiency. This gives a clear idea of which
solution is more appropriate for each particular case.

Dr. Saurabh and Neelam [11] suggested identifying important reduced input features for building an IDS
that is computationally efficient and effective. The paper investigates the performance of three standard
feature selection methods: Correlation-based Feature Selection, Information Gain and Gain Ratio. It also
proposes the Feature Vitality Based Reduction Method to identify important reduced input features.

D. Mutz et al. [13] argue that most hybrid systems obtain high false alarm rates due to simplistic
approaches to combining the outputs of the techniques in the decision phase. They propose a hybrid host-based
anomaly detection system consisting of four detection techniques: analysing string length, character
distribution and structure, and identifying learned tokens, in which a Bayesian network is employed to decide
the final output classification. The system was validated on the DARPA99 dataset and compared with a simple
threshold-based approach. Both approaches (Bayesian and threshold) were given the same outputs from the
detection techniques. With 90% true positives, the threshold-based approach led to twice as many false
positives as the Bayesian network.

Guorui Li et al. [17] proposed a distributed group-based intrusion detection scheme that partitions the
sensor network into many groups, in which the sensors in each group are physically close to each other and
are equipped with the same sensing capability. The intrusion detection algorithm simultaneously takes
multiple attributes of the sensor nodes into consideration to detect malicious attackers precisely.
Experiments with real data show that the algorithm can decrease the false alarm rate and increase the
detection accuracy compared with existing intrusion detection schemes, while lowering the computation and
transmission power consumption.

Hung-Jen Liao et al. [18] noted that Intrusion Detection Systems (IDSs) have received a lot of attention
throughout the computer science field. Existing IDSs pose challenges not only in terms of unpredictable
intrusion categories but also in terms of huge computational power. Although there is a body of existing
literature on IDS issues, the paper attempts to give a more elaborate picture through a comprehensive review.
Through an extensive survey and careful organization, it proposes a taxonomy to outline modern IDSs.

Levent Koc et al. [27] noted that techniques such as pattern recognition and the data mining of network
events are often used by intrusion detection systems to classify network events as either normal or attack
events. The study claims that the Hidden Naive Bayes (HNB) model can be applied to intrusion detection
problems that suffer from high dimensionality, highly correlated features and high network data stream
volumes. HNB is a data mining model that relaxes the Naive Bayes method's conditional independence
assumption. The experimental results show that the HNB model exhibits superior overall performance in terms
of accuracy, error rate and misclassification cost compared with the traditional Naive Bayes model, leading
extended Naive Bayes models and the Knowledge Discovery and Data Mining (KDD) Cup 1999 winner.

M. Sabhnani and Serpen [29] examined the performance of several machine learning techniques
on the KDD Cup 99 dataset, including a C4.5 DT. The DT obtained good accuracy, but did not perform as well
as other techniques on some classes of intrusion, particularly U2R and R2L attacks, both of which are minor
classes and include a large proportion of new attack types. An ANN and K-Means clustering obtained higher
detection rates on these classes; these two techniques are better able to generalize from learned data to
new, unseen data.

Manasi Gyanchandani et al. [36] evaluated the performance of the C4.5 classifier and its combination
with bagging, boosting and stacking over the NSL-KDD dataset for intrusion detection. This dataset
consists of selected records of the complete KDD dataset.

N. Ben Amor et al. [37] conducted an empirical investigation on the KDD Cup 99 dataset, comparing
the performance of NB and a Decision Tree (DT). The DT obtains a higher accuracy (92.28% compared with
91.47%), but NB obtains better detection rates on the three minor classes, namely Probing, U2R and R2L
intrusions. Most significantly, the DT detects merely 0.52% of R2L intrusions whilst NB detects 7.11%.

P. Garcia Teodoro et al. [39] observed that security tools incorporating anomaly detection functionalities
are just starting to appear and that several important problems remain to be solved. The paper reviews the
most well-known anomaly-based intrusion detection techniques and presents platforms, systems under
development and research projects in the area. Finally, it outlines the main challenges to be dealt with for
the wide-scale deployment of anomaly-based intrusion detectors, with special emphasis on assessment issues.

Phurivite Sangkatsanee et al. [40] proposed a real-time intrusion detection approach using a supervised
machine learning technique. The approach is simple and efficient and can be used with many machine
learning techniques. The authors applied different well-known machine learning techniques to evaluate the
performance of their IDS approach, and the experimental results show that the Decision Tree technique can
outperform the other techniques. The paper also identified 12 essential features of network data that are
relevant to detecting network attacks, using information gain as the feature selection criterion, and
developed a new post-processing procedure to reduce the false-alarm rate as well as increase the reliability
and detection accuracy of the intrusion detection system.

Simon T. Powers et al. [41] evaluated a hybrid system. Specifically, anomalous network connections are
initially detected using an artificial immune system; connections that are flagged as anomalous are then
categorised using a Kohonen Self Organising Map, allowing higher-level information, in the form of cluster
membership, to be extracted. Experimental results on the KDD 1999 Cup dataset show a low false positive rate
and a detection and classification rate for Denial of Service (DoS) and User to Root (U2R) attacks that is
higher than those in a sample of other works.

Sanjay Rawat et al. [42] investigated the applicability of a spectral analysis technique, Singular Value
Decomposition (SVD), as a pre-processing step to reduce the dimensionality of the data. This reduction
highlights the most prominent features in the data by eliminating noise. The pre-processing step not only
makes the data noise-free but also reduces its dimensionality, thereby minimizing computational time. The
proposed technique can be applied to other existing methods to improve their performance. Experiments on
various data sets, such as DARPA 98, UNM sendmail, inetd and login-ps, show that reducing the dimension of
the data does not degrade the performance of the IDS. In fact, in the case of single-application monitoring
such as sendmail, applying reduction techniques gives very encouraging results.

S. Peddabachigari et al. [43] conducted an empirical investigation of SVMs and DTs, analyzing 
their performance as standalone detectors and as hybrids. Two hybrid models were examined: a hierarchical 
model (DT-SVM), with the DT as the first layer producing node information for the SVM in the second 
layer, and an ensemble model comprising the standalone techniques and the hierarchical hybrid. In the 
ensemble approach, each technique is given a weight according to its detection rate for each particular 
attack type during training. 

Yuk Ying Chung et al. [45] proposed a new hybrid intrusion detection system that uses Intelligent 
Dynamic Swarm based Rough Set (IDS-RS) for feature selection and Simplified Swarm Optimization (SSO) 
for intrusion data classification. IDS-RS is proposed to select the most relevant features that can represent 
the pattern of the network traffic. To improve the performance of the SSO classifier, a new Weighted Local 
Search (WLS) strategy incorporated in SSO is proposed. The purpose of this local search strategy is to 
discover a better solution in the neighbourhood of the current solution produced by SSO. The performance 
of the proposed hybrid system was evaluated on the KDD Cup 99 dataset by comparing it with standard 
Particle Swarm Optimization (PSO) and two other popular standard classifiers. The test results showed that 
the proposed hybrid system achieves a higher classification accuracy than the others, at 93.3%, and that it 
can be a competitive classifier for intrusion detection systems. 

III. PROPOSED METHODOLOGY 

In an intrusion detection system, it is essential to perform well on unknown attacks. This work 
evaluates the performance of various classification algorithms on the test dataset of novel attacks as well as 
on the Original and Preprocessed test datasets. 

Figure 3.1 Proposed Model for Feature Extraction based Classification Techniques 

3.1 Proposed Algorithm 

The framework consists of the following four steps: 
Step 1: Apply the classification techniques Naive Bayes, Bagging, Boosting, Stacking and J48 to the Original 
Dataset DS0 to categorize records into the classes Normal, Probe, DoS, U2R and R2L. 
Step 2: Apply feature extraction through the Information Gain Attribute Ranking technique to reduce the 
dimension of the Original Dataset DS0; this preprocessing produces Dataset DS1. 
Step 3: Apply the classification techniques Naive Bayes, Bagging, Boosting, Stacking and J48 to Dataset 
DS1 to categorize records into the classes Normal, Probe, DoS, U2R and R2L. 
Step 4: Compare the classification results between the Original and Preprocessed datasets. 
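
The attribute ranking in the second step can be illustrated with a minimal sketch. It assumes nominal attributes and computes information gain as the class entropy minus the class entropy conditioned on the attribute; the function names and the toy records are hypothetical and are not taken from the paper's WEKA configuration or from NSL-KDD itself.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """H(class) - H(class | attribute) for one nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy records (hypothetical values chosen only for illustration).
names = ["protocol_type", "flag"]
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("tcp", "S0")]
labels = ["Normal", "DoS", "Normal", "DoS"]

ranking = rank_attributes(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy example the second attribute separates the two classes perfectly, so it is ranked first. A reduced dataset is then obtained by keeping only the highest-ranked attributes; numeric attributes would first need to be discretized, which this sketch does not attempt.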

IV. EXPERIMENTAL SETUP 

4.1 Experiment Design 

The classifiers used for the experiments are Naive Bayes, Bagging, Boosting, Stacking, and J48. Two 
datasets created from the NSL-KDD dataset are used as input. The experiments are conducted with the WEKA 
tool, which contains implementations of various machine learning algorithms used for data mining. The 
RunWEKA.ini file is edited to assign 1.5 GB of memory to WEKA in order to handle the large volume of data. 

4.2 Evaluation metrics 

The metrics mainly used to evaluate the performance of a classifier are presented in [38]. 
The True Positives (TP) and True Negatives (TN) are correct classifications. 

A False Positive (FP) occurs when the outcome is incorrectly predicted as yes (or positive) when it is actually 
no (negative). 

A False Negative (FN) occurs when the outcome is incorrectly predicted as negative when it is actually 
positive. 

Probability of Detection (PD)/Recall: The percentage of the total relevant documents in a database retrieved 
by the search. If you knew that there were 1000 relevant documents in a database and your search retrieved 100 
of these relevant documents, your recall would be 10%. For intrusion detection, it is the percentage of all 
attacks that are detected. 

PD/Recall = (Total_Detected_Attacks / Total_Attacks) * 100 
Recall = TP/(TP+FN) 

False Alarm Rate (FAR): The percentage of false alarms raised when the event did not occur. 

FAR = (Total_Misclassified_Process / Total_Normal_Process) * 100 
False Alarm Rate = FP/(FP+TN) 

Precision: The percentage of relevant documents in relation to the number of documents retrieved. If your 
search retrieves 100 documents and 20 of these are relevant, your precision is 20%. 
Precision = TP/(TP+FP) 

F-measure: The harmonic mean of precision and recall 
F = 2 * Recall * Precision / (Recall + Precision) 

The True Positive rate is TP divided by the total number of positives, which is TP + FN. 
The False Positive rate is FP divided by the total number of negatives, FP + TN. 
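
As a worked illustration of these definitions, the following minimal sketch computes the metrics from the four confusion-matrix counts, taking the false alarm rate to be the false positive rate FP/(FP+TN) defined above. The function names and the counts are hypothetical and are not results from the experiments.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, fp, fn, tn):
    """Compute recall, false alarm rate, precision and F-measure from counts."""
    recall = safe_ratio(tp, tp + fn)            # probability of detection
    false_alarm_rate = safe_ratio(fp, fp + tn)  # false positive rate
    precision = safe_ratio(tp, tp + fp)
    f_measure = safe_ratio(2 * recall * precision, recall + precision)
    return {
        "recall": recall,
        "false_alarm_rate": false_alarm_rate,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts chosen only for illustration.
metrics = detection_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the recall is 0.800, the false alarm rate 0.100, the precision 0.889 and the F-measure 0.842.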

                                Actual Result 
Predicted Result     Intrusion               Normal 
Intrusion            True Positive (TP)      False Positive (FP) 
Normal               False Negative (FN)     True Negative (TN) 

Fig 4.1 Predicted Classes 
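
The four cells of this matrix can be counted directly from the actual and predicted labels. The sketch below is a simplified binary view in which every class other than Normal counts as an intrusion; the function name and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, normal_label="Normal"):
    """Count TP, FP, FN, TN, treating any non-normal label as an intrusion."""
    tp = fp = fn = tn = 0
    for actual_label, predicted_label in zip(actual, predicted):
        actual_attack = actual_label != normal_label
        predicted_attack = predicted_label != normal_label
        if predicted_attack and actual_attack:
            tp += 1  # intrusion predicted, intrusion occurred
        elif predicted_attack and not actual_attack:
            fp += 1  # false alarm on normal traffic
        elif not predicted_attack and actual_attack:
            fn += 1  # missed intrusion
        else:
            tn += 1  # normal traffic recognised as normal
    return tp, fp, fn, tn


# Hypothetical labels chosen only for illustration.
actual = ["DoS", "Normal", "Probe", "Normal", "R2L", "Normal"]
predicted = ["DoS", "DoS", "Normal", "Normal", "Probe", "Normal"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)
```

Note that in this binary view an R2L record predicted as Probe still counts as a true positive; a per-class evaluation, as reported in the result tables, would instead treat each class in turn as the positive class.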

4.3 Preprocessing of data 

Model generation has been found to be computationally intensive. To reduce the time required, 
redundant attributes (which may also introduce noise into the classification task) can be removed by various 
feature selection algorithms. Table 4.1 summarizes the feature selection algorithm and the search method used 
to generate dataset DS1 from the original dataset. 



Dataset   Feature Selection Algorithm   Search Method   No. of Attributes 
DS0       Original dataset              None            42 
DS1       InfoGainAttributeEval         Ranker          24 



Table 4.1 Data set Generation 



V. RESULTS 

This section presents a comparative analysis of the classification techniques using the F-measure. 

5.1 Naive Bayes Classifiers for DS0 

Class     Precision   Recall   F-Measure 
Normal    0.71        0.97     0.81 
Probe     0.71        0.63     0.64 
DoS       0.98        0.96     0.97 
U2R       0.32        0.43     0.24 
R2L       0.43        0.45     0.65 

Table 5.1 Results of Naive Bayes Classifiers for DS0 



Table 5.1 shows that the Naive Bayes classifier achieves its highest F-measure for DoS attacks and its 
lowest for U2R attacks. Figure 5.1 displays these results as a bar chart. 

[Bar chart of the results by type of class] 

Figure 5.1 Results of Naive Bayes Classifiers for Dataset DS0 

5.2 Naive Bayes Classifiers for DS1 



Class     Precision   Recall   F-Measure 
Normal    0.78        0.77     0.78 
Probe     0.68        0.28     0.92 
DoS       0.89        0.97     0.94 
U2R       0.66        0.18     0.45 
R2L       0.07        0.87     0.54 

Table 5.2 Results of Naive Bayes Classifiers for DS1 



Table 5.2 shows that the Naive Bayes classifier achieves its highest F-measure for DoS attacks and its 
lowest for U2R attacks. Figure 5.2 displays these results as a bar chart. 

[Bar chart of the results by type of class] 

Figure 5.2 Results of Naive Bayes Classifiers for Dataset DS1 

5.3 Bagging Classifiers for DS0 

Class     Precision   Recall   F-Measure 
Normal    0.73        0.99     0.84 
Probe     0.92        0.74     0.82 
DoS       0.99        0.97     0.95 
U2R       0.68        0.57     0.25 
R2L       0.98        0.56     0.23 

Table 5.3 Results of Bagging Classifiers for DS0 
Table 5.3 shows that the Bagging classifier achieves its highest F-measure for DoS attacks and its 
lowest for R2L attacks. Figure 5.3 displays these results as a bar chart. 

[Bar chart of the results by type of class] 

Figure 5.3 Results of Bagging Classifiers for Dataset DS0 

5.4 Bagging Classifiers for DS1 

Class     Precision   Recall   F-Measure 
Normal    0.75        0.97     0.83 
Probe     0.73        0.64     0.82 
DoS       0.98        0.97     0.96 
U2R       0.18        0.18     0.25 
R2L       0.14        0.79     0.23 

Table 5.4 Results of Bagging Classifiers for DS1 
Table 5.4 shows that the Bagging classifier achieves its highest F-measure for DoS attacks and its 
lowest for R2L attacks. Figure 5.4 displays these results as a bar chart. 

Figure 5.4 Results of Bagging Classifiers for Dataset DS1 

5.5 Boosting Classifiers for DS0 

Class     Precision   Recall   F-Measure 
Normal    0.75        0.91     0.87 
Probe     0.73        0.74     0.74 
DoS       0.95        0.96     0.96 
U2R       0.45        0.13     0.26 
R2L       0.84        0.16     0.41 

Table 5.5 Results of Boosting Classifiers for DS0 
Table 5.5 shows that the Boosting classifier achieves its highest F-measure for DoS attacks and its 
lowest for U2R attacks. Figure 5.5 displays these results as a bar chart. 

[Bar chart of the results by type of class] 

Figure 5.5 Results of Boosting Classifiers for Dataset DS0 

5.6 Boosting Classifiers for DS1 

Class     Precision   Recall   F-Measure 
Normal    0.78        0.77     0.77 
Probe     0.68        0.28     0.39 
DoS       0.89        0.97     0.93 
U2R       0.66        0.18     0.34 
R2L       0.75        0.71     0.66 

Table 5.6 Results of Boosting Classifiers for DS1 
Table 5.6 shows that the Boosting classifier achieves its highest F-measure for DoS attacks and its 
lowest for U2R attacks. Figure 5.6 displays these results as a bar chart. 

[Bar chart of the results by type of class] 

Figure 5.6 Results of Boosting Classifiers for Dataset DS1 






Feature Extraction Based Classification Technique For Intrusion Detection System 



5.7 Stacking Classifiers for DS0 

Class     Precision   Recall   F-Measure 
Normal    0.73        0.98     0.83 
Probe     0.71        0.81     0.76 
DoS       0.99        0.97     0.98 
U2R       0.52        0.16     0.34 
R2L       0.88        0.07     0.24 

Table 5.7 Results of Stacking Classifiers for DS0 
Table 5.7 shows that the Stacking classifier achieves its highest F-measure for DoS attacks and its 
lowest for R2L attacks. Figure 5.7 displays these results as a bar chart. 

Figure 5.7 Results of Stacking Classifiers for Dataset DS0 

5.8 Stacking Classifiers for DS1 

Class     Precision   Recall   F-Measure 
Normal    0.74        0.97     0.84 
Probe     0.71        0.63     0.67 
DoS       0.98        0.97     0.98 
U2R       0.75        0.71     0.75 
R2L       0.68        0.43     0.81 

Table 5.8 Results of Stacking Classifiers for DS1 
Table 5.8 shows that the Stacking classifier achieves its highest F-measure for DoS attacks and its 
lowest for Probe attacks. Figure 5.8 displays these results as a bar chart. 

Figure 5.8 Results of Stacking Classifiers for Dataset DS1 

5.9 J48 Classifiers for DS0 

Class     Precision   Recall   F-Measure 
Normal    0.77        0.98     0.86 
Probe     0.45        0.56     0.57 
DoS       0.61        0.54     0.99 
U2R       0.62        0.16     0.18 
R2L       0.49        0.57     0.76 

Table 5.9 Results of J48 Classifiers for DS0 
Table 5.9 shows that the J48 classifier achieves its highest F-measure for DoS attacks and its lowest 
for U2R attacks. Figure 5.9 displays these results as a bar chart. 

Figure 5.9 Results of J48 Classifiers for Dataset DS0 



5.10 J48 Classifiers for DS1 

Class     Precision   Recall   F-Measure 
Normal    0.81        0.97     0.88 
Probe     0.49        0.53     0.51 
DoS       0.12        0.55     0.75 
U2R       0.34        0.32     0.57 
R2L       0.71        0.89     0.46 

Table 5.10 Results of J48 Classifiers for DS1 
Table 5.10 shows that the J48 classifier achieves its highest F-measure for the Normal class and its 
lowest for R2L attacks. Figure 5.10 displays these results as a bar chart. 

Figure 5.10 Results of J48 Classifiers for Dataset DS1 



5.11 Comparative Analysis for DS0 

Class     Naive Bayes   Bagging   Boosting   Stacking   J48 
Normal    0.81          0.84      0.87       0.83       0.86 
Probe     0.64          0.82      0.74       0.76       0.57 
DoS       0.97          0.95      0.96       0.98       0.99 
U2R       0.24          0.25      0.26       0.34       0.18 
R2L       0.65          0.23      0.41       0.24       0.76 

Table 5.11 Result of Performance evaluation (F-Measure) of different Classifiers for DS0 
Table 5.11 shows that, for the Original Dataset DS0, the J48 classifier achieves the highest F-measure 
for DoS attacks (0.99) and the Bagging classifier the lowest F-measure for R2L attacks (0.23). Figure 5.11 
displays these results as a bar chart. 
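
The per-class comparison can be read off Table 5.11 mechanically. The following minimal sketch copies the F-measure values from that table and reports the best and worst classifier for each class; the helper name is hypothetical.

```python
classifiers = ["Naive Bayes", "Bagging", "Boosting", "Stacking", "J48"]

# F-measure values copied from Table 5.11 (Original Dataset DS0).
f_measure_ds0 = {
    "Normal": [0.81, 0.84, 0.87, 0.83, 0.86],
    "Probe": [0.64, 0.82, 0.74, 0.76, 0.57],
    "DoS": [0.97, 0.95, 0.96, 0.98, 0.99],
    "U2R": [0.24, 0.25, 0.26, 0.34, 0.18],
    "R2L": [0.65, 0.23, 0.41, 0.24, 0.76],
}


def best_and_worst(scores):
    """Return ((best_name, best_score), (worst_name, worst_score))."""
    pairs = list(zip(classifiers, scores))
    best = max(pairs, key=lambda pair: pair[1])
    worst = min(pairs, key=lambda pair: pair[1])
    return best, worst


summary = {name: best_and_worst(row) for name, row in f_measure_ds0.items()}
for name, (best, worst) in summary.items():
    print(f"{name}: best {best[0]} ({best[1]:.2f}), "
          f"worst {worst[0]} ({worst[1]:.2f})")
```

For DoS the best classifier is J48 (0.99) and for R2L the worst is Bagging (0.23), matching the statement above; the same table also shows J48 giving the best R2L score (0.76) and the worst U2R score (0.18).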



[Bar chart: performance evaluation of different classifiers for DS0, by class] 

Figure 5.11 Result of Performance evaluation (F-Measure) of different Classifiers for DS0 

5.12 Comparative Analysis for DS1 



Class     Naive Bayes   Bagging   Boosting   Stacking   J48 
Normal    0.78          0.83      0.77       0.84       0.88 
Probe     0.92          0.82      0.39       0.67       0.51 
DoS       0.94          0.96      0.93       0.98       0.75 
U2R       0.45          0.25      0.34       0.75       0.57 
R2L       0.54          0.23      0.66       0.81       0.46 

Table 5.12 Result of Performance evaluation (F-Measure) of different Classifiers for DS1 
Table 5.12 shows that, for the Preprocessed Dataset DS1, the Stacking classifier achieves the highest 
F-measure for DoS attacks (0.98) and the Bagging classifier the lowest F-measure for R2L attacks (0.23). 
Figure 5.12 displays these results as a bar chart. 
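
The fourth step of the framework, the comparison between the Original and Preprocessed datasets, can be sketched from the values in Tables 5.11 and 5.12. The unweighted mean over the five classes used below is an assumption of this sketch, chosen as one simple way to summarize each classifier; it is not a statistic reported in the paper.

```python
classes = ["Normal", "Probe", "DoS", "U2R", "R2L"]

# F-measure per classifier, in the class order above (Table 5.11, DS0).
ds0 = {
    "Naive Bayes": [0.81, 0.64, 0.97, 0.24, 0.65],
    "Bagging": [0.84, 0.82, 0.95, 0.25, 0.23],
    "Boosting": [0.87, 0.74, 0.96, 0.26, 0.41],
    "Stacking": [0.83, 0.76, 0.98, 0.34, 0.24],
    "J48": [0.86, 0.57, 0.99, 0.18, 0.76],
}

# F-measure per classifier, in the class order above (Table 5.12, DS1).
ds1 = {
    "Naive Bayes": [0.78, 0.92, 0.94, 0.45, 0.54],
    "Bagging": [0.83, 0.82, 0.96, 0.25, 0.23],
    "Boosting": [0.77, 0.39, 0.93, 0.34, 0.66],
    "Stacking": [0.84, 0.67, 0.98, 0.75, 0.81],
    "J48": [0.88, 0.51, 0.75, 0.57, 0.46],
}


def mean(values):
    """Unweighted arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Change in mean F-measure when moving from DS0 to DS1.
delta = {name: mean(ds1[name]) - mean(ds0[name]) for name in ds0}
for name, change in sorted(delta.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {mean(ds0[name]):.3f} -> {mean(ds1[name]):.3f} ({change:+.3f})")
```

Under this summary, Stacking and Naive Bayes improve on the preprocessed dataset, Bagging is unchanged, and Boosting and J48 decrease slightly. A weighted average that accounts for the class distribution could lead to a different ordering.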




Figure 5.12 Result of Performance evaluation (F-Measure) of different Classifiers for DS1 

The conclusion and future scope are discussed in the next section. 

VI. CONCLUSION AND FUTURE SCOPE 

This research set out to identify the best-performing classification algorithm for intrusion 
detection. Two datasets, Original and Preprocessed, were evaluated across the classes Normal, Probe, DoS, 
U2R and R2L. The Preprocessed dataset was obtained by reducing the number of features using the 
information gain technique. The experimental results show that, for the Original Dataset DS0, the J48 
classifier achieves the highest F-measure for DoS attacks and the Bagging classifier the lowest F-measure for 
R2L attacks, whereas for the Preprocessed Dataset DS1 the Stacking classifier achieves the highest F-measure 
for DoS attacks and the Bagging classifier again the lowest for R2L attacks. 

The present study focused on a few issues such as high dimensionality, scalability and accuracy, but 
many issues remain for further research. Algorithms that are not included in WEKA can be tested, and 
experiments with various feature selection techniques should be compared. Classification techniques from 
data mining are useful in many domains, e.g. the university domain (category wise), the medical domain, the 
crime domain, Auto Price, Zoo etc. A cost-based classifier, which keeps track of a cost matrix containing the 
cost of each misclassification, can be applied to IDS. A classifier combination that includes a tree-based 
classifier can also be explored: as the results section shows, the tree-based classifier (J48) gives the best 
precision for the Normal class, so packets it classifies as normal can be declared normal, and the remaining 
instances, i.e. those classified as attacks, can be passed to another good classifier at a second level. 